Exhaustive whole-genome tandem repeats search
Abstract
MOTIVATION: Approximate tandem repeats (ATRs) occur frequently in genomes and are a source of the polymorphisms observed among individuals, which makes them of interest to those studying genetic disorders. Although extensive work has been done to identify ATRs, current approaches are inherently limited in the number of pattern sizes they can search or in the length of the input sequence they can handle.
RESULTS: This paper describes (1) a new algorithm that exhaustively finds all variable-length ATRs in a genomic sequence and (2) a precise description of redundancy in the output, together with an algorithm that significantly reduces it. Our ATR definition is parameterized by a mismatch ratio p, which allows more mismatches in longer tandem repeats and fewer in shorter ones. Furthermore, the algorithm is embarrassingly parallel and can therefore attain near-linear speed-up on Beowulf clusters. We present results of the algorithm applied to sequences of widely differing lengths, from genes to chromosomes.
AVAILABILITY: Source and binaries are available on request.
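To make the role of the mismatch ratio p concrete, here is a minimal brute-force sketch. It is not the paper's algorithm: it assumes, purely for illustration, that two adjacent copies of period k count as an approximate repeat when their Hamming distance is at most floor(p * k). The names `hamming`, `find_atr_pairs`, `min_period` and `max_period` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_atr_pairs(
    seq: str, p: float, min_period: int = 1, max_period: int = 10
) -> List[Tuple[int, int, int]]:
    """Return (start, period, mismatches) for each pair of adjacent copies.

    Assumption (not taken from the paper): a pair qualifies when its
    Hamming distance is at most floor(p * period).
    """
    hits = []
    for period in range(min_period, max_period + 1):
        # Longer periods get a larger mismatch budget, shorter ones a smaller one.
        budget = int(p * period)
        for start in range(len(seq) - 2 * period + 1):
            left = seq[start:start + period]
            right = seq[start + period:start + 2 * period]
            mismatches = hamming(left, right)
            if mismatches <= budget:
                hits.append((start, period, mismatches))
    return hits


if __name__ == "__main__":
    # "ACGT" followed by "ACGA": one mismatch over a period of 4.
    print(find_atr_pairs("ACGTACGA", p=0.25, min_period=4, max_period=4))
```

In this sketch each period is examined independently of the others, so the outer loop could be split across workers without coordination. That is one simple way a search of this kind can be embarrassingly parallel, though the paper may partition the work differently.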
Similar resources
An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction.
An exhaustive search for shortly spaced repeats in 74 bacterial chromosomes reveals that they are much more numerous than is usually acknowledged. These repeats were divided into five classes: close repeats (CRs), tandem repeats (TRs), simple sequence repeats (SSRs), spaced interspersed direct repeats, and "others." CRs are widespread and constitute the most abundant class, particularly in codi...
Tandem repeats over the edit distance
MOTIVATION A tandem repeat in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology and population studies. Although tandem repea...
Filtering Tandem Repeats in DNA Sequences
A tandem repeat is a sequence of two or more contiguous, approximate copies of a pattern. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology, and population studies. Although tandem repeats have been used by biologists ...
Genomic abundance is not predictive of tandem repeat localization in grass genomes
Highly repetitive regions have historically posed a challenge when investigating sequence variation and content. High-throughput sequencing has enabled researchers to use whole-genome shotgun sequencing to estimate the abundance of repetitive sequence, and these methodologies have been recently applied to centromeres. Previous research has investigated variation in centromere repeats across euk...
VNTR9 and VNTR10, two newly-found variable-number tandem repeat loci useful in MLVA genotyping of Bordetella pertussis
Background & Aims: Bordetella pertussis, the causative agent of whooping cough, continues to infect human hosts even in populations where infants and children are routinely vaccinated. Causes of pertussis epidemiology are not fully identified unless strains of the pathogen are characterized by molecular means. Globally, Multi-Locus Variable-Number Tandem Repeat Analysis (MLVA) has pro...
Journal: Bioinformatics
Volume 20, Issue 16
Pages: -
Publication date: 2004